Revisiting Label Smoothing Regularization with Knowledge Distillation

Authors

Abstract

Label Smoothing Regularization (LSR) is a widely used tool to generalize classification models by replacing the one-hot ground truth with smoothed labels. Recent research on LSR has increasingly focused on its correlation with Knowledge Distillation (KD), which transfers knowledge from a teacher model to a lightweight student model by penalizing the Kullback–Leibler divergence between their outputs. Based on this observation, a Teacher-free Knowledge Distillation (Tf-KD) method was proposed in previous work: instead of a real teacher model, a handcrafted distribution similar to LSR is used to guide the student's learning. Tf-KD is a promising substitute for LSR except for its hard-to-tune and model-dependent hyperparameters. This paper develops a new teacher-free framework, LSR-OS-TC, which decomposes the Tf-KD method into two components: Output Smoothing (OS) and Teacher Correction (TC). First, LSR-OS extends the LSR method to the KD regime by applying a softer temperature to the output of the softmax layer; this output smoothing is critical for stabilizing the hyperparameters across different models. Second, in the TC part, a larger proportion of the uniform teacher distribution is assigned to the correct class to provide a more informative teacher. The two-component method was evaluated exhaustively on image (CIFAR-100, CIFAR-10, and CINIC-10) and audio (GTZAN) classification tasks. The results show that LSR-OS can improve performance independently and with no extra computational cost, especially on several deep neural networks where LSR is ineffective; the further training boost from the TC component confirms the effectiveness of the strategy. Overall, compared with the original Tf-KD method, LSR-OS-TC is a practical substitution for LSR whose hyperparameters can be tuned on one model and directly applied to others.
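To make the two components concrete, the following PyTorch-style sketch shows one way they can be combined into a single training loss. It is an illustrative reading of the abstract, not the authors' released code: the function name lsr_os_tc_loss, the hyperparameter names T, alpha, and beta, and the exact weighting of the two terms are assumptions made for this example.

import torch
import torch.nn.functional as F

def lsr_os_tc_loss(logits, target, num_classes, T=2.0, alpha=0.1, beta=0.9):
    # Hard-label cross-entropy on the student's raw logits.
    ce = F.cross_entropy(logits, target)

    # Teacher Correction (TC): a handcrafted teacher distribution that is
    # uniform over the wrong classes but assigns a larger proportion (beta)
    # to the correct class, giving a more informative virtual teacher.
    teacher = torch.full_like(logits, (1.0 - beta) / (num_classes - 1))
    teacher.scatter_(1, target.unsqueeze(1), beta)

    # Output Smoothing (OS): soften the student's output with temperature T
    # before measuring its KL divergence to the handcrafted teacher.
    log_student = F.log_softmax(logits / T, dim=1)
    kd = F.kl_div(log_student, teacher, reduction="batchmean")

    # Weighted combination of the two terms; alpha balances them.
    return (1.0 - alpha) * ce + alpha * kd

Because the teacher is a handcrafted distribution rather than a separate network, no teacher forward pass is needed, which is consistent with the abstract's claim that the method adds no extra computational cost.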


Similar resources

Topic Distillation with Knowledge Agents

This is the second year that our group participates in TREC’s Web track. Our experiments focused on the Topic distillation task. Our main goal was to experiment with the Knowledge Agent (KA) technology [1], previously developed at our Lab, for this particular task. The knowledge agent approach was designed to enhance Web search results by utilizing domain knowledge. We first describe the generi...


Simple Square Smoothing Regularization Operators

Tikhonov regularization of linear discrete ill-posed problems often is applied with a finite difference regularization operator that approximates a low-order derivative. These operators generally are represented by banded rectangular matrices with fewer rows than columns. They therefore cannot be applied in iterative methods that are based on the Arnoldi process, which requires the regularizati...
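For reference, the formulation this snippet describes can be written in LaTeX as follows; the operator L shown here, a first-order finite-difference matrix, is only one common example of the low-order derivative approximations it mentions.

\min_{x \in \mathbb{R}^{n}} \; \|Ax - b\|_{2}^{2} + \mu \, \|Lx\|_{2}^{2},
\qquad
L = \begin{bmatrix}
1 & -1 &        &    \\
  & \ddots & \ddots &  \\
  &        & 1 & -1
\end{bmatrix} \in \mathbb{R}^{(n-1) \times n}.

Such an L is banded and rectangular with fewer rows than columns, which is exactly the property the snippet points to as problematic for Arnoldi-based iterative methods.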


Smoothing speech trajectories by regularization

The articulators of human speech might only be able to move slowly, which results in the gradual and continuous change of acoustic speech properties. Nevertheless, the so-called speech continuity is rarely explored to discriminate different phones. To exploit this, this paper investigates a multiple-frame MFCC representation (that is expected to retain sufficient time-continuity information) in...


Multi-Label Learning with Posterior Regularization

In many multi-label learning problems, especially as the number of labels grows, it is challenging to gather completely annotated data. This work presents a new approach for multi-label learning from incomplete annotations. The main assumption is that, because of label correlation, the true label matrix as well as the soft predictions of classifiers shall be approximately low rank. We introduce a...
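As a generic illustration of the low-rank assumption in this snippet (not necessarily the exact formulation introduced in that paper), incomplete multi-label annotations are often handled with nuclear-norm regularization:

\min_{Y} \; \mathcal{L}\!\left(Y_{\Omega}, \tilde{Y}_{\Omega}\right) + \lambda \, \|Y\|_{*},

where \Omega indexes the observed entries, \tilde{Y}_{\Omega} are the incomplete annotations, and the nuclear norm \|Y\|_{*} serves as a convex surrogate for the rank of the completed label matrix Y.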


Sequence-Level Knowledge Distillation

Neural machine translation (NMT) offers a novel alternative formulation of translation that is potentially simpler than statistical approaches. However, to reach competitive performance, NMT models need to be exceedingly large. In this paper we consider applying knowledge distillation approaches (Bucila et al., 2006; Hinton et al., 2015) that have proven successful for reducing the size of neura...



Journal

Journal title: Applied Sciences

Year: 2021

ISSN: 2076-3417

DOI: https://doi.org/10.3390/app11104699